A Three-Phase System for Chinese Named Entity Recognition

نویسندگان

  • Conrad Chen
  • Hsi-Jian Lee
چکیده

The handling of out-of-vocabulary (OOV) words is one of the key points to a high performance lexical analysis in natural language processing. Among all OOV words, named entities (NE) are the most productive ones. They generally constitute the most meaningful parts of sentences (persons, affairs, time, places, and objects). In this paper, we propose a three-phase “generation, filtering, and recovery” system to address the NER problem. A set of stochastic models is first used to generate all possible NE candidates. Then we treat candidate filtering as an ambiguity resolution problem. To resolve ambiguities, we adopt a maximal-matching-rule-driven lexical analyzer. Last, a pattern matching method is applied to detect and recover abnormalities in the results of the previous two phases. Pure lexical information is exploited in our system. We get a high recall of 96% with personal names (PER), satisfiable recall of 88%, 89%, and 80% with transliteration names (TRA), location names (LOC), and organization names (ORG), respectively. The overall precision and excluding rate is over 90% and 99%.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features

Named Entity Recognition is an information extraction technique that identifies name entities in a text. Three popular methods have been conventionally used namely: rule-based, machine-learning-based and hybrid of them to extract named entities from a text. Machine-learning-based methods have good performance in the Persian language if they are trained with good features. To get good performanc...

متن کامل

بهبود شناسایی موجودیت‌های نامدار فارسی با استفاده از کسره اضافه

Named entity recognition is a process in which the people’s names, name of places (cities, countries, seas, etc.) and organizations (public and private companies, international institutions, etc.), date, currency and percentages in a text are identified. Named entity recognition plays an important role in many NLP tasks such as semantic role labeling, question answering, summarization, machine ...

متن کامل

Chinese Named Entity Recognition with a Multi-Phase Model

Chinese named entity recognition is one of the difficult and challenging tasks of NLP. In this paper, we present a Chinese named entity recognition system using a multi-phase model. First, we segment the text with a character-level CRF model. Then we apply three word-level CRF models to the labeling person names, location names and organization names in the segmentation results, respectively. O...

متن کامل

تشخیص اسامی اشخاص با استفاده از تزریق کلمه‌های نامزد اسم در میدان‌های تصادفی شرطی برای زبان عربی

Named Entity Recognition and Extraction are very important tasks for discovering proper names including persons, locations, date, and time, inside electronic textual resources. Accurate named entity recognition system is an essential utility to resolve fundamental problems in question answering systems, summary extraction, information retrieval and extraction, machine translation, video interpr...

متن کامل

Named Entity Recognition in Persian Text using Deep Learning

Named entities recognition is a fundamental task in the field of natural language processing. It is also known as a subset of information extraction. The process of recognizing named entities aims at finding proper nouns in the text and classifying them into predetermined classes such as names of people, organizations, and places. In this paper, we propose a named entity recognizer which benefi...

متن کامل

سیستم شناسایی و طبقه‌بندی موجودیت‌های اسمی در متون زبان فارسی بر پایه شبکه عصبی

Named Entity Recognition (NER) is a fundamental task in natural language processing and also known as a subset of information extraction. We seek to locate and classify named entities in text into predefined categories such as the names of persons, organizations, locations, expressions of times, etc. Named Entity Recognition for English texts has been researched widely for the past years, howev...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004